Stochastic recursive inclusion in two timescales with an application to the Lagrangian dual problem
In this paper we present a framework to analyze the asymptotic behavior of
two timescale stochastic approximation algorithms, including those with
set-valued mean fields. This paper builds on the works of Borkar and of
Perkins & Leslie. The framework presented herein is more general than the
synchronous two timescale framework of Perkins & Leslie, yet the
assumptions involved are easily verifiable. As an application, we use this
framework to analyze the two timescale stochastic approximation algorithm
corresponding to the Lagrangian dual problem in optimization theory.
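To make the scheme concrete, here is a minimal sketch of a primal-dual two timescale stochastic approximation iteration on a toy constrained problem. The objective, constraint, step-size exponents and noise model are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# A minimal sketch of two timescale stochastic approximation for a
# Lagrangian dual problem, assuming the toy problem
#   minimize f(x) = (x - 2)^2  subject to  g(x) = x - 1 <= 0,
# whose saddle point is x* = 1, lambda* = 2.  Problem, step sizes and
# noise are illustrative choices, not the paper's setting.

rng = np.random.default_rng(0)

def grad_f(x):   # gradient of the objective
    return 2.0 * (x - 2.0)

def g(x):        # constraint function, feasible when g(x) <= 0
    return x - 1.0

x, lam = 0.0, 0.0
for n in range(1, 100_001):
    a_n = 1.0 / n**0.6      # fast (primal) step size
    b_n = 1.0 / n           # slow (dual) step size, b_n / a_n -> 0
    noise = rng.normal(scale=0.1, size=2)   # martingale-difference noise

    # Fast timescale: noisy gradient descent on the Lagrangian in x.
    x = x - a_n * (grad_f(x) + lam + noise[0])
    # Slow timescale: noisy projected gradient ascent in the multiplier.
    lam = max(0.0, lam + b_n * (g(x) + noise[1]))

print(f"x ~ {x:.3f} (expect 1), lambda ~ {lam:.3f} (expect 2)")
```

The primal variable runs on the faster timescale, so from the multiplier's viewpoint the inner minimization has effectively equilibrated, which is the standard intuition behind such two timescale schemes.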
Rainbow Connection Number and Radius
The rainbow connection number, rc(G), of a connected graph G is the minimum
number of colours needed to colour its edges, so that every pair of its
vertices is connected by at least one path in which no two edges are coloured
the same. In this note we show that for every bridgeless graph G with radius r,
rc(G) <= r(r + 2). We demonstrate that this bound is the best possible for
rc(G) as a function of r, not just for bridgeless graphs, but also for graphs
of any stronger connectivity. It may be noted that for a general 1-connected
graph G, rc(G) can be arbitrarily larger than its radius (the star graph, for
instance). We further show that for every bridgeless graph G with radius r and
chordality (size of a largest induced cycle) k, rc(G) <= rk.
It is known that computing rc(G) is NP-hard [Chakraborty et al., 2009]. Here,
we present an (r+3)-factor approximation algorithm which runs in O(nm) time and
a (d+3)-factor approximation algorithm which runs in O(dm) time to rainbow
colour any connected graph G on n vertices, with m edges, diameter d and radius
r.
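As a companion to the definition, the sketch below checks whether a given edge colouring rainbow connects a small graph by searching over (vertex, used-colour-set) states. It is a brute-force verifier for small instances, not the paper's (r+3)-factor algorithm; the example graph and colouring are illustrative.

```python
from collections import deque

# A minimal sketch that checks whether an edge colouring makes a graph
# rainbow connected, i.e. every pair of vertices is joined by a path whose
# edges all receive distinct colours.  Any rainbow walk can be shortcut
# into a rainbow path, so searching over walks suffices.  The state space
# is exponential in the number of colours; intended for tiny graphs only.

def rainbow_connected(n, coloured_edges):
    """coloured_edges: dict {(u, v): colour} on vertices 0..n-1 (undirected)."""
    adj = {v: [] for v in range(n)}
    for (u, v), c in coloured_edges.items():
        adj[u].append((v, c))
        adj[v].append((u, c))

    for s in range(n):
        reached = {s}
        queue = deque([(s, frozenset())])   # (vertex, colours used so far)
        seen = {(s, frozenset())}
        while queue:
            v, used = queue.popleft()
            for w, c in adj[v]:
                if c in used:               # colour already on the walk
                    continue
                state = (w, used | {c})
                if state not in seen:
                    seen.add(state)
                    reached.add(w)
                    queue.append(state)
        if len(reached) < n:                # some vertex has no rainbow path
            return False
    return True

# A 4-cycle coloured with 2 colours is rainbow connected (rc(C4) = 2).
c4 = {(0, 1): 0, (1, 2): 1, (2, 3): 0, (3, 0): 1}
print(rainbow_connected(4, c4))   # True
```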
3DPG: Distributed Deep Deterministic Policy Gradient Algorithms for Networked Multi-Agent Systems
We present Distributed Deep Deterministic Policy Gradient (3DPG), a
multi-agent actor-critic (MAAC) algorithm for Markov games. Unlike previous
MAAC algorithms, 3DPG is fully distributed during both training and deployment.
3DPG agents calculate local policy gradients based on the most recently
available local data (states, actions) and local policies of other agents.
During training, this information is exchanged over a potentially lossy and
delay-prone communication network. The network therefore induces Age of
Information (AoI) for data and policies. We prove the asymptotic convergence of
3DPG even in the presence of potentially unbounded AoI.
This provides an important step towards practical online and distributed
multi-agent learning since 3DPG does not assume information to be available
deterministically. We analyze 3DPG in the presence of delayed policy and data transfers
under mild practical assumptions. Our analysis shows that 3DPG agents converge
to a local Nash equilibrium of Markov games in terms of utility functions
expressed as the expected value of the agents' local approximate action-value
functions (Q-functions). The expectations of the local Q-functions are with
respect to limiting distributions over the global state-action space shaped by
the agents' accumulated local experiences. Our results also shed light on the
policies obtained by general MAAC algorithms. We show through a heuristic
argument and numerical experiments that 3DPG improves convergence over previous
MAAC algorithms that use old actions instead of old policies during training.
Further, we show that 3DPG is robust to AoI; it learns competitive policies
even with large AoI and low data availability.
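The sketch below illustrates the kind of local actor update the abstract describes: agent i differentiates its local critic through its own policy while substituting the most recently received, possibly aged, copy of the other agent's policy. The two-agent setting, network sizes and variable names are hypothetical assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

# A minimal sketch of a distributed deterministic policy gradient step in
# the spirit of 3DPG.  Agent i ascends its local Q-value in its own action
# while treating the stale copy of agent j's policy as a constant.
# Dimensions and the two-agent setup are illustrative assumptions.

STATE_DIM, ACT_DIM = 4, 2

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                         nn.Linear(64, out_dim))

# Agent i's own actor and critic; Q_i takes the global state and all actions.
actor_i = mlp(STATE_DIM, ACT_DIM)
critic_i = mlp(STATE_DIM + 2 * ACT_DIM, 1)

# Stale copy of agent j's policy, delivered over the lossy network some
# AoI steps ago; it is frozen during agent i's update.
stale_actor_j = mlp(STATE_DIM, ACT_DIM)
for p in stale_actor_j.parameters():
    p.requires_grad_(False)

opt = torch.optim.Adam(actor_i.parameters(), lr=1e-3)

states = torch.randn(32, STATE_DIM)   # batch of locally stored states
a_i = actor_i(states)                 # own action, differentiable
a_j = stale_actor_j(states)           # other agent's (aged) action

# Deterministic policy gradient: ascend the local Q-value in a_i.
q = critic_i(torch.cat([states, a_i, a_j], dim=1))
loss = -q.mean()
opt.zero_grad()
loss.backward()
opt.step()
```

Using the aged policy, rather than logged old actions, is what the abstract credits for the improved convergence over previous MAAC algorithms.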
Deep Reinforcement Learning for Wireless Sensor Scheduling in Cyber-Physical Systems
In many Cyber-Physical Systems, we encounter the problem of remote state
estimation of geographically distributed and remote physical processes. This
paper studies the scheduling of sensor transmissions to estimate the states of
multiple remote, dynamic processes. Information from the different sensors has
to be transmitted to a central gateway over a wireless network for monitoring
purposes, where typically fewer wireless channels are available than there are
processes to be monitored. For effective estimation at the gateway, the sensors
need to be scheduled appropriately, i.e., at each time instant one needs to
decide which sensors have network access and which ones do not. To address this
scheduling problem, we formulate an associated Markov decision process (MDP).
This MDP is then solved using a Deep Q-Network, a recent deep reinforcement
learning algorithm that is at once scalable and model-free. We compare our
scheduling algorithm to popular scheduling algorithms such as round-robin and
reduced-waiting-time, among others. Our algorithm is shown to significantly
outperform these algorithms in many example scenarios.
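A minimal sketch of how such a scheduling MDP can be wired to a Q-network follows, assuming one staleness feature per process and actions that pick which K of the M sensors transmit. The feature choice, dimensions and epsilon-greedy rule are illustrative assumptions, not the paper's exact formulation.

```python
import itertools
import random
import torch
import torch.nn as nn

# A minimal sketch of Deep Q-Network action selection for sensor
# scheduling: the state collects one staleness/error feature per process,
# and each action grants the K wireless channels to a K-subset of the M
# sensors.  M, K and the network are illustrative assumptions.

M, K = 5, 2                                           # processes vs. channels
ACTIONS = list(itertools.combinations(range(M), K))   # all K-subsets

q_net = nn.Sequential(nn.Linear(M, 64), nn.ReLU(),
                      nn.Linear(64, len(ACTIONS)))

def schedule(state, eps=0.1):
    """Epsilon-greedy choice of the sensor subset granted network access."""
    if random.random() < eps:
        return random.choice(ACTIONS)
    with torch.no_grad():
        q_values = q_net(torch.as_tensor(state, dtype=torch.float32))
    return ACTIONS[int(q_values.argmax())]

# Example: one staleness value per process; sensors 0..M-1 compete for slots.
state = [3.0, 0.0, 7.0, 1.0, 2.0]
print("scheduled sensors:", schedule(state))
```

Because the action space is the set of K-subsets, it grows combinatorially in M; enumerating it as network outputs, as done here, is one simple design choice for moderate M.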